In [3]:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import expit
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
The polynomial basis function is provided by scikit-learn
in the sklearn.preprocessing module.
In [4]:
X = np.arange(1, 9).reshape(4, 2)
X
Out[4]:
In [5]:
PolynomialFeatures(degree=2).fit_transform(X)
Out[5]:
Unfortunately, this is pretty much the extent of what scikit-learn
provides in the way of basis functions. Here we define some standard basis functions, while adhering to the scikit-learn
interface. This will be important when we try to incorporate our basis functions in pipelines and feature unions later on. While this is not strictly required, it will certainly make life easier for us down the road.
In [53]:
class RadialFeatures(BaseEstimator, TransformerMixin):
    def __init__(self, mu=0, s=1):
        self.mu = mu  # basis function centers, one per row
        self.s = s    # length scale

    def fit(self, X, y=None):
        # this basis function is stateless;
        # fit need only return self
        return self

    def transform(self, X, y=None):
        # Gaussian radial basis function centered at each row of mu
        return np.exp(-cdist(X, self.mu, 'sqeuclidean')/(2*self.s**2))
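Concretely, the transform computes the Gaussian radial basis function $\phi_j(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2}{2s^2}\right)$, where each row $\boldsymbol{\mu}_j$ of mu is a center and s is a shared scale, so the output has one feature per center.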
In [54]:
class SigmoidalFeatures(BaseEstimator, TransformerMixin):
    def __init__(self, mu=0, s=1):
        self.mu = mu  # basis function centers, one per row
        self.s = s    # slope parameter

    def fit(self, X, y=None):
        # this basis function is stateless;
        # fit need only return self
        return self

    def transform(self, X, y=None):
        # logistic sigmoid applied to the scaled distance to each center
        return expit(cdist(X, self.mu)/self.s)
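Likewise, SigmoidalFeatures computes $\phi_j(\mathbf{x}) = \sigma\left(\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert / s\right)$ for each center $\boldsymbol{\mu}_j$, where $\sigma(a) = \frac{1}{1 + e^{-a}}$ is the logistic sigmoid provided by scipy.special.expit.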
In [8]:
mu = np.linspace(0.1, 1, 10).reshape(5, 2)
mu
Out[8]:
In [9]:
RadialFeatures(mu=mu).fit_transform(X).round(2)
Out[9]:
In [10]:
SigmoidalFeatures(mu=mu).fit_transform(X).round(2)
Out[10]:
Now that we have a few basis functions at our disposal, let's try to apply different basis functions to different features of a dataset. We use the diabetes dataset, a real-world dataset with 442 instances and 10 features. We first work through each step manually, and then show how the steps can be combined using scikit-learn's feature unions and pipelines to form a single model that performs all the necessary steps in one fell swoop.
In [11]:
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
In [12]:
X.shape
Out[12]:
In [13]:
y.shape
Out[13]:
We print every other feature for just the first few instances, to get an idea of what the data looks like.
In [14]:
# sanity check
X[:5, ::2]
Out[14]:
In [15]:
# sanity check
y[:5]
Out[15]:
Assume for some reason we are interested in training a model using, say, features 2 and 5 with a polynomial basis, and features 6, 8 and 9 with a radial basis. We first slice up our original dataset.
In [16]:
X1 = X[:, np.array([2, 5])]
X1.shape
Out[16]:
In [17]:
# sanity check
X1[:5]
Out[17]:
In [18]:
X2 = X[:, np.array([6, 8, 9])]
X2.shape
Out[18]:
In [19]:
# sanity check
X2[:5]
Out[19]:
Now we apply the respective basis functions.
In [20]:
X1_poly = PolynomialFeatures().fit_transform(X1)
X1_poly.shape
Out[20]:
In [21]:
# sanity check
X1_poly[:5].round(2)
Out[21]:
In [22]:
mu = np.linspace(0, 1, 6).reshape(2, 3)
mu
Out[22]:
In [23]:
X2_radial = RadialFeatures(mu).fit_transform(X2)
X2_radial.shape
Out[23]:
In [24]:
# sanity check
X2_radial[:5].round(2)
Out[24]:
Now we're ready to concatenate these augmented datasets.
In [25]:
X_concat = np.hstack((X1_poly, X2_radial))
X_concat.shape
Out[25]:
In [26]:
# sanity check
X_concat[:5, ::2].round(2)
Out[26]:
Now we are ready to train a regressor with this augmented dataset. For this example, we'll simply use a linear regression model.
In [27]:
model = LinearRegression()
model.fit(X_concat, y)
Out[27]:
In [28]:
model.score(X_concat, y)
Out[28]:
(To no one's surprise, our model performs quite poorly, since zero effort was made to identify and incorporate the most informative features or appropriate basis functions. Rather, they were chosen solely to maximize clarity of exposition.)
So let's recap what we've done.

1. We started with the 442x10 matrix X.
2. We sliced X to obtain the 442 by 2 matrix X1 (features 2 and 5) and the 442 by 3 matrix X2 (features 6, 8 and 9).
3. We applied a polynomial basis function of degree 2 to X1, which has 2 features and 442 samples. This returns a dataset X1_poly with $\begin{pmatrix} 4 \\ 2 \end{pmatrix} = 6$ features and 442 samples. (NB: In general, the number of output features for a polynomial basis function of degree $d$ on $n$ features is the number of multisets of cardinality $d$, with elements taken from a finite set of cardinality $n+1$, which is given by the multiset coefficient $\begin{pmatrix} \begin{pmatrix} n + 1 \\ d \end{pmatrix} \end{pmatrix} = \begin{pmatrix} n + d \\ d \end{pmatrix}$.) So from the 442 by 2 matrix X1 we obtain the 442 by 6 matrix X1_poly.
4. We applied a radial basis function with the 2 centers mu to X2. From the 442 by 3 matrix X2, we obtain the 442 by 2 matrix X2_radial.
5. We horizontally concatenated the 442 by 6 matrix X1_poly with the 442 by 2 matrix X2_radial to obtain the final 442 by 8 matrix X_concat.
6. We trained a linear regression model on X_concat.

So this is how we went from the 442x10 matrix X to the 442x8 matrix X_concat.
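As a quick sanity check of the feature-count formula above, here is a small sketch (it imports scipy.special.comb, which is not otherwise used in this notebook) comparing the multiset coefficient against the actual output width of PolynomialFeatures on X1:

from scipy.special import comb

n, d = 2, 2  # number of input features and polynomial degree
expected = comb(n + d, d, exact=True)  # multiset coefficient: binom(n + d, d) = 6
actual = PolynomialFeatures(degree=d).fit_transform(X1).shape[1]
expected == actual  # True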
Now let's see how these steps can be combined using scikit-learn's pipelines and feature unions. First we define a transformer that slices up the input data. Note that instead of working with (tuples of) slice objects directly, it is usually more convenient to use the NumPy helper np.index_exp. We explain later why this transformer is necessary.
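For example, np.index_exp simply builds the tuple of indexing objects that the corresponding bracket expression would produce:

np.index_exp[:, np.array([2, 5])]
# (slice(None, None, None), array([2, 5]))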
In [47]:
class ArraySlicer(BaseEstimator, TransformerMixin):
    def __init__(self, index_exp):
        self.index_exp = index_exp  # e.g. np.index_exp[:, np.array([2, 5])]

    def fit(self, X, y=None):
        # stateless: nothing to fit
        return self

    def transform(self, X, y=None):
        # simply index the input array with the stored index expression
        return X[self.index_exp]
In [48]:
model = \
    make_pipeline(
        make_union(
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([2, 5])]),
                PolynomialFeatures()
            ),
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([6, 8, 9])]),
                RadialFeatures(mu)
            )
        )
    )
In [49]:
model.fit(X)
Out[49]:
In [50]:
model.transform(X).shape
Out[50]:
In [51]:
# sanity check
model.transform(X)[:5, ::2].round(2)
Out[51]:
This effectively composes each of the steps we had to perform manually and amalgamates them into a single transformer. We can even append a regressor at the end to make it a complete estimator/predictor.
In [34]:
model = \
    make_pipeline(
        make_union(
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([2, 5])]),
                PolynomialFeatures()
            ),
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([6, 8, 9])]),
                RadialFeatures(mu)
            )
        ),
        LinearRegression()
    )
In [35]:
model.fit(X, y)
Out[35]:
In [36]:
model.score(X, y)
Out[36]:
The most important thing to note is that everything in scikit-learn is either a transformer or a predictor, and almost always an estimator. An estimator is simply a class that implements the fit method, while a transformer and a predictor implement a, well, transform and predict method, respectively. From this simple interface, we get a surprisingly high amount of functionality and flexibility.
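For instance, because our custom basis functions derive from TransformerMixin, they get fit_transform for free; a quick check (reusing the X2 and mu defined above):

rf = RadialFeatures(mu)
# fit_transform is equivalent to calling fit followed by transform
np.allclose(rf.fit_transform(X2), rf.fit(X2).transform(X2))  # True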
A pipeline behaves as a transformer or a predictor depending on what the last step of the pipeline is. If the last step is a transformer, the entire pipeline is a transformer and one can call fit, transform or fit_transform on it like on an ordinary transformer. The same is true if the last step is a predictor. Essentially, all a pipeline does is chain the fit_transform calls of every transformer in the pipeline. If we think of ordinary transformers as functions, pipelines can be thought of as higher-order functions that simply compose an arbitrary number of functions.
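As a minimal sketch of this composition view (stacking two polynomial transformers purely for illustration), a pipeline's fit_transform is equivalent to chaining the steps' fit_transform calls by hand:

pipe = make_pipeline(PolynomialFeatures(), PolynomialFeatures())
chained = PolynomialFeatures().fit_transform(PolynomialFeatures().fit_transform(X))
np.allclose(pipe.fit_transform(X), chained)  # True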
In [55]:
model = \
    make_pipeline(
        PolynomialFeatures(),  # transformer
        LinearRegression()     # predictor
    )
In [56]:
model.fit(X, y)
Out[56]:
In [57]:
model.score(X, y)
Out[57]:
A union is a transformer that is initialized with an arbitrary number of transformers. When fit_transform is called on a dataset, it simply calls fit_transform on each of the transformers it was given and horizontally concatenates their results.
In [60]:
mu_ = np.linspace(0, 10, 30).reshape(3, 10)
In [61]:
model = \
    make_union(
        PolynomialFeatures(),
        RadialFeatures(mu_)
    )
If we run this on the original 442x10 dataset, we expect to get a dataset with the same number of samples and $\begin{pmatrix} 12 \\ 2 \end{pmatrix} + 3 = 66 + 3 = 69$ features.
In [62]:
model.fit_transform(X).shape
Out[62]:
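To make the concatenation explicit, here is a quick check (reusing mu_ from above) that the union's output matches a manual np.hstack of the individual fit_transform results:

manual = np.hstack((
    PolynomialFeatures().fit_transform(X),
    RadialFeatures(mu_).fit_transform(X),
))
np.allclose(model.fit_transform(X), manual)  # True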
The above union applies the basis functions to the entire dataset, but we're interested in applying different basis functions to different features. To do this, we can define a rather frivolous transformer that does nothing but slice the input data, and that's exactly what ArraySlicer was for.
In [67]:
model = \
    make_pipeline(
        ArraySlicer(np.index_exp[:, np.array([2, 5])]),
        PolynomialFeatures()
    )
In [68]:
model.fit(X)
Out[68]:
In [69]:
model.transform(X).shape
Out[69]:
In [70]:
# sanity check
model.transform(X)[:5].round(2)
Out[70]:
Then we can put all of this together to form the complete pipeline we showed earlier.
In [72]:
model = \
    make_pipeline(
        make_union(
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([2, 5])]),
                PolynomialFeatures()
            ),
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([6, 8, 9])]),
                RadialFeatures(mu)
            )
        ),
        LinearRegression()
    )
This gives us a predictor which takes some input, slices out the respective features, churns them through the basis functions and finally trains a linear regressor on the result, all in one go!
In [73]:
model.fit(X, y)
Out[73]:
In [74]:
model.score(X, y)
Out[74]:
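Since the last step is a predictor, the fitted pipeline also exposes predict, so we can generate predictions directly from the raw 442x10 input, for example:

# predictions for the first few instances, straight from the raw features
model.predict(X[:5])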